Steps taken before implementing PCA:

  1. Replaced values below the limit of detection (< LOD) with half the minimum observed value for each variable. Note: after this calculation, some values become infinite; these have been set to 0.
  2. Checked missingness in the data.
  3. Removed zero variance variables.
  4. Centered and scaled all variables. PCA finds linear combinations of the predictors based on how much variance they explain, so without scaling the variables with the largest raw variance would dominate the components. Centering and scaling also improves numerical stability.
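
The four steps above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the code used for this analysis. The function name `preprocess`, the boolean mask `below_lod`, and the use of the sample standard deviation (n - 1 denominator) for scaling are all assumptions made for the example. The sketch also assumes that the infinite values mentioned in step 1 come from variables with no observed value above the LOD, which the source does not state.

```python
import numpy as np

def preprocess(X, below_lod):
    """Hypothetical helper: X is a samples x variables array,
    below_lod is a boolean mask marking entries reported as < LOD."""
    X = np.array(X, dtype=float)
    below_lod = np.asarray(below_lod, dtype=bool)

    # Step 1: replace < LOD entries with half the per-variable minimum
    # of the observed values.
    observed = np.where(below_lod, np.nan, X)
    has_obs = ~np.all(np.isnan(observed), axis=0)
    # Assumption: a variable with no observed value has no finite minimum;
    # it is represented as inf here and then set to 0, as in the note.
    col_min = np.full(X.shape[1], np.inf)
    col_min[has_obs] = np.nanmin(observed[:, has_obs], axis=0)
    X = np.where(below_lod, col_min / 2.0, X)
    X[np.isinf(X)] = 0.0

    # Step 2: check missingness (count of missing values per variable).
    missing_per_var = np.isnan(X).sum(axis=0)

    # Step 3: remove zero-variance variables.
    keep = np.nanvar(X, axis=0) > 0
    X = X[:, keep]

    # Step 4: center and scale each remaining variable.
    Z = (X - np.nanmean(X, axis=0)) / np.nanstd(X, axis=0, ddof=1)
    return Z, keep, missing_per_var
```

The returned `keep` mask records which variables survived step 3, so that component loadings can later be matched back to variable names.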

Notes about the following results:

  1. The number of components for each dataset was chosen from its scree plot.
  2. Importance is based on the contribution of each variable to each component. If all variables contributed equally to a component, each would account for 1/ncol(data) of it. Any variable that contributes more than 1/ncol(data) to a component can therefore be considered an important contributor to that component. The vertical line in each plot marks this threshold.
  3. Output data from PCA merged with bmid has been saved to “/data/KI/imic/results/pcaData/”.
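
As a rough illustration of the importance threshold described in note 2, the sketch below computes per-variable contributions from a centered and scaled matrix. It assumes that a variable's contribution to a component is its squared loading, so that contributions within a component sum to 1 and the equal-contribution baseline is 1 / (number of variables). The source does not specify the exact formula, and the function names `pca_contributions` and `top_contributors` are hypothetical. Python with NumPy is used for illustration only.

```python
import numpy as np

def pca_contributions(Z, n_components):
    """Z: centered and scaled samples x variables array."""
    Z = np.asarray(Z, dtype=float)
    # PCA via singular value decomposition of the scaled data.
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[:n_components].T            # variables x components, signed
    explained = (s ** 2) / np.sum(s ** 2)     # proportions shown in a scree plot
    # Assumed definition of contribution: squared loading.
    # Each column sums to 1 because the loading vectors have unit length.
    contrib = loadings ** 2
    threshold = 1.0 / Z.shape[1]              # equal-contribution baseline
    return loadings, contrib, explained[:n_components], threshold

def top_contributors(contrib, names, component, n=10):
    """Return the n variables with the largest contribution to one component."""
    order = np.argsort(contrib[:, component])[::-1][:n]
    return [(names[j], contrib[j, component]) for j in order]
```

Variables whose contribution exceeds `threshold` would fall to the right of the vertical line in the importance plots. The signed `loadings` correspond to the "by value" plots, which show the direction of each variable's contribution.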

ELICIT

Biocrates normalized: targeted metabolomics data

Scree plot

Top 10 contributors to each component based on importance

Top 10 contributors to each component by value to see direction

Metabolite indicators: sums and ratios

Scree plot

Top 10 contributors: importance

Top 10 contributors: value

VITAL

Biocrates normalized: targeted metabolomics data

Scree plot

Top 10 contributors: importance

Top 10 contributors: value

Metabolite indicators: sums and ratios

Scree plot

Top 10 contributors: importance

Top 10 contributors: value

Proteomics data

Scree plot

Top 10 contributors: importance

Top 10 contributors: value